Fake News Detection

🔍 Overview

This project addresses the rising challenge of online misinformation by building machine learning models to identify fake news articles. We used NLP techniques and multiple classifiers to predict whether a news article is real or fake based solely on its content.

🧭 Approach

We collected and cleaned a labeled dataset of news headlines and article bodies. The cleaned text was vectorized with TF-IDF and used to train several supervised classifiers, whose performance at separating real from fake articles we then compared.
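The cleaning and TF-IDF steps can be illustrated with a small, dependency-free sketch. It is not the project's code: the helpers `clean_text` and `tfidf` and the short `STOPWORDS` set are hypothetical. The weighting mirrors scikit-learn's `TfidfVectorizer` defaults: raw term counts, smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2-normalised rows.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the project's actual
# stopword source is not specified.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "was", "on", "for"}


def clean_text(text: str) -> list[str]:
    """Lowercase, replace punctuation/special characters via regex, drop stopwords."""
    # Anything that is not a lowercase letter or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """TF-IDF with smoothed IDF and L2-normalised rows (scikit-learn's default scheme)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each document vector to unit length (guard against empty docs).
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows
```

For example, `clean_text("BREAKING: Aliens land in the capital!!!")` returns `['breaking', 'aliens', 'land', 'capital']`. If that headline is vectorized alongside "The capital hosts an annual trade summit.", the shared term `capital` receives a lower weight than the rarer `aliens`. This is the property that lets TF-IDF emphasise distinctive wording.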

⚙️ Methodologies

Text Cleaning: Removed punctuation and special characters with regular expressions, and filtered out stopwords
TF-IDF Vectorization: Converted text to numerical vectors
Modeling: Trained Logistic Regression, Naive Bayes, Decision Tree, SVM, Random Forest, and Gradient Boosting
Evaluation: Accuracy, confusion matrix, precision, recall, F1-score
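The modeling and evaluation steps above can be sketched with scikit-learn. This version assumes the articles are a list of strings `texts` with a parallel list `labels`. Several choices are illustrative assumptions, not the project's reported setup: the helper `compare_models`, the 25% stratified hold-out, `LinearSVC` standing in for "SVM", `MultinomialNB` for Naive Bayes, and default hyperparameters.

```python
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Assumed model choices and hyperparameters (library defaults unless noted).
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}


def compare_models(texts, labels, test_size=0.25, seed=42):
    """Train each classifier on TF-IDF features and report hold-out metrics."""
    # Stratify so both classes keep their proportions in the test split.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    results = {}
    for name, model in MODELS.items():
        # The vectorizer sits inside the pipeline, so its vocabulary and IDF
        # weights are learned from the training split only (no test leakage).
        pipe = make_pipeline(TfidfVectorizer(stop_words="english"), clone(model))
        pipe.fit(X_train, y_train)
        preds = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, preds),
            "confusion_matrix": confusion_matrix(y_test, preds),
            # Per-class precision, recall and F1 as a nested dict.
            "report": classification_report(
                y_test, preds, output_dict=True, zero_division=0
            ),
        }
    return results
```

Calling `compare_models(texts, labels)` returns one entry per classifier. Each entry holds its accuracy, its confusion matrix, and a per-class precision/recall/F1 report. Placing the vectorizer in the same pipeline as each model ensures every classifier sees features fitted only on training data.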

🧰 Technologies

Language: Python
Libraries: scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
NLP Techniques: TF-IDF, text preprocessing
Models: Logistic Regression, Naive Bayes, Decision Tree, SVM, Random Forest, Gradient Boosting

💡 Key Learnings

Gained hands-on experience applying NLP techniques to real-world data
Learned how to evaluate model performance for high-stakes classification problems
Understood how different algorithms perform on imbalanced text datasets

📈 Results